Approximating Shortest Superstring Problem Using de Bruijn Graphs

Authors

  • Alexander Golovnev
  • Alexander S. Kulikov
  • Ivan Mihajlin
Abstract

The best known approximation ratio for the shortest superstring problem is 2 11/23 (Mucha, 2012). In this note, we improve this bound for the case when the length of all input strings is equal to r, for r ≤ 7. For example, for strings of length 3 we get a 1 1/3-approximation. An advantage of the algorithm is that it is extremely simple both to implement and to analyze. Another advantage is that it is based on de Bruijn graphs. Such graphs are widely used in genome assembly (one of the most important practical applications of the shortest common superstring problem). At the same time, these graphs have only a few applications in theoretical investigations of the shortest superstring problem.
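
As a rough illustration of the underlying object (not the authors' algorithm, which the abstract does not spell out), the sketch below builds a de Bruijn graph from input strings of equal length r: each string becomes an edge from its (r-1)-character prefix to its (r-1)-character suffix, and a walk through the graph spells a candidate superstring. The function names `de_bruijn_graph`, `spell_walk`, and `is_superstring` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def de_bruijn_graph(strings):
    """Build a de Bruijn graph from strings of equal length r >= 2.

    Each distinct input string s contributes one edge s[:-1] -> s[1:].
    Returns a dict mapping a vertex to the list of its out-neighbours.
    """
    lengths = {len(s) for s in strings}
    if len(lengths) != 1:
        raise ValueError("all input strings must have the same length r")
    if lengths.pop() < 2:
        raise ValueError("this sketch assumes r >= 2")
    graph = defaultdict(list)
    for s in sorted(set(strings)):
        graph[s[:-1]].append(s[1:])
    return dict(graph)


def spell_walk(vertices):
    """Spell the string read along a walk given as a list of vertices."""
    return vertices[0] + "".join(v[-1] for v in vertices[1:])


def is_superstring(candidate, strings):
    """Check that every input string occurs as a substring of candidate."""
    return all(s in candidate for s in strings)


strings = ["abc", "bcd", "cda"]
graph = de_bruijn_graph(strings)
walk = ["ab", "bc", "cd", "da"]  # follows the three edges in order
candidate = spell_walk(walk)     # "abcda"
```

Here the walk uses each of the three edges once, so the spelled string has length 3 + (r - 1) = 5 and contains every input string. How to handle graphs that admit no such single walk is where an approximation algorithm has to do real work, and this sketch does not attempt it.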


Similar resources

Approximating the Shortest Superstring Problem Using de Bruijn Graphs

The best known approximation ratio for the shortest superstring problem is 2 11/23 (Mucha, 2012). In this note, we improve this bound for the case when the length of all input strings is equal to r, for r ≤ 7. For example, for strings of length 3 we get a 1 1/3-approximation. An advantage of the algorithm is that it is extremely simple both to implement and to analyze. Another advantage is tha...


Solving 3-Superstring in 3^{n/3} Time

In the shortest common superstring problem (SCS) one is given a set s_1, ..., s_n of n strings and the goal is to find a shortest string containing each s_i as a substring. While many approximation algorithms for this problem have been developed, it is still not known whether it can be solved exactly in fewer than 2^n steps. In this paper we present an algorithm that solves the special case when ...


An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs

Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equiva...


Routing and Transmitting Problems in de Bruijn Networks

De Bruijn graphs, both directed and undirected, have received considerable attention as architecture for interconnection networks. In this paper, we focus on undirected de Bruijn networks of radix d and dimension n, denoted by UB(d, n). We first discuss the shortest-path routing problem. We present properties of the shortest paths between any two vertices of UB(d, n) and propose two shortest-pa...


Efficient Algorithms for de novo Assembly of Alternative Splicing Events from RNA-seq Data

In this thesis, we address the problem of identifying and quantifying variants (alternative splicing and genomic polymorphism) in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the fundamental idea that each variant corresponds to a recognizable pattern, a bubble, in a de Bruijn graph constructed from the RNA-seq reads, we propose a general...




Publication date: 2013